An Information Comparison of Conventional and Adaptive
Tests in the Measurement of Classroom Achievement


Achievement testing consists of mapping an individual's proficiency level
onto an observable indicator of proficiency. This mapping is accomplished by
means of a testing procedure. Two of the characteristics defining a testing
procedure (Sympson, 1975) are the nature of the items in the test and the way
in which the test items are administered. Both of these characteristics are
potentially important factors in determining how accurately the observable
indicator will reflect the individual's underlying proficiency level.

Given  an  item  type,  there  are  basically  two  ways  of  administering  a test 
— individually  or  in  groups.  In  group  testing  everyone  answers  the  same  set 
of  test  items;  in  individualized  or  adaptive  testing  everyone  receives  a 
different  set  of  items,  and  the  difficulty  of  a test  is  dynamically  tailored 
to the ability level of the testee. The psychometric advantages and
disadvantages of these two modes of administration have been the subject of research
in  recent  years  (Weiss,  1976;  Weiss  & Betz,  1973).  Results  of  this  research 
suggest  that  adaptive  testing  is  superior  to  group  (conventional)  testing  in 
terms  of  precision  of  measurement  (McBride  & Weiss,  1976;  Vale  & Weiss,  1975b; 
Weiss,  1976),  test-taking  motivation  (Betz  & Weiss,  1976),  and  potential  to 
eliminate  bias  (Pine  & Weiss,  1976). 

Virtually  all  of  this  research  is  based  on  ability  measurement  rather 
than  achievement  measurement.  The  question  which  arises,  therefore,  is  whether 
or  not  similar  benefits  would  accrue  in  achievement  testing.  Since  achievement 
testing  can  be  conceptualized  in  several  ways  (Green,  1974),  however,  a general 
answer  to  this  question  may  not  be  possible.  For  example,  mastery  testing 
(Block, 1971) is an approach to achievement testing which is currently
receiving attention from both practitioners and theoreticians. The purpose of
mastery testing is to classify individuals into two states: mastery and
nonmastery. Because of the instructional philosophy behind mastery testing, there
is likely to be a lack of variability in performance at the time of testing on
a given instructional unit; and as a result, it becomes profitable to tailor
the length of a test rather than its difficulty. Ferguson (1969) has
demonstrated the feasibility of implementing such a testing system.

However,  when  instruction  is  likely  to  result  in  substantial  variation 
with  respect  to  achievement  in  the  population  being  tested,  the  procedures 
for  adaptive  ability  testing  become  relevant  for  achievement  testing,  provided 
that the same response models which apply in ability testing are also
applicable in the measurement of achievement. In a previous report Bejar, Weiss,
and Kingsbury (1977) established the plausibility of that assumption in a
college instructional setting. The purpose of this study is to investigate in
that same setting the performance of an adaptive testing model designed for
ability measurement in comparison to classroom examinations covering the same
course content.

Comparing testing procedures is difficult (Sympson, 1975) since different
procedures usually differ in more than one respect. Comparisons
between testing procedures are further complicated by the criteria for
evaluation (Weiss & Betz, 1973). Reliability and correlational indices
have been used to compare testing procedures in many live data
investigations (e.g., Betz & Weiss, 1975; Vale & Weiss, 1975a) and in some
simulation investigations (e.g., Jensema, 1976, pp. 82-89). Such comparisons are
less than optimal. By summarizing all the data in one single value,
important information is likely to be lost (Samejima, 1977).

A more  appropriate  evaluative  criterion  for  comparing  testing  procedures 
is  psychometric  information.  Unlike  reliability  and  correlational  indices, 
information  is  an  index  of  the  precision  of  measurement  at  all  levels  of  the 
trait  being  measured.  Information  functions  are  particularly  useful  in 
comparing  test  models  analytically.  Bejar  (1975)  used  information  functions 
to  compare  the  dichotomous,  graded,  and  continuous  response  models;  Hambleton 
and  Traub  (1971)  used  them  to  compare  several  logistic  test  models.  Because 
the comparison was among models in these cases, the use of information
functions was appropriate.

The comparison of the same model under two modes of administration
(conventional and adaptive) is of interest in research on adaptive testing.
In this research (e.g., McBride & Weiss, 1976; Vale & Weiss, 1975a) information
functions have been computed by Monte Carlo procedures. The relative
efficiency of the two modes of test administration has then been determined
by the ratio of the information functions. The results of such comparisons,
however, are theoretical predictions which should be verified empirically.

Research  comparing  conventional  and  adaptive  testing,  using  information 
as the evaluation criterion, has been based almost exclusively on Monte
Carlo simulated data. These simulation studies suggest that adaptive
testing yields more precise scores than conventional testing; they are not entirely
generalizable, however, since they are based on data that fit the model
perfectly.  There  has  been  only  one  study  based  on  data  from  live  testees 
which  used  information  as  an  evaluative  criterion  (Brown  & Weiss,  1977); 
however,  it  was  a real-data  simulation  study  (Weiss  & Betz,  1973,  pp.  11-12) 
which  did  not  involve  the  actual  adaptive  administration  of  test  items  to 
testees. 

Purpose 

The major aim of the present investigation was to compare an adaptive
achievement test to a conventionally-administered classroom test, using
information as the evaluative criterion. In contrast to previous
investigations, the measure of information used was derived from live test
administration of both the adaptive and conventional tests. Because classroom
examinations are seldom designed to be psychometrically optimal, the adaptive
test was also compared to an improved conventional test which was constructed
from the same item pool. In addition, the data provided an opportunity to
study the effects of expansion of the adaptive test item pool on its
information characteristics.

Method


Data for this study were obtained from students enrolled in a large
introductory Biology course at the University of Minnesota (see Bejar et al.,
1977). Two midquarter examinations and a final examination are administered
in the course. Although each midquarter examination covers several content
areas, a single dimension has been shown to account for performance on the
examinations (Bejar et al., 1977). In addition to the classroom examinations,
volunteers  completed  two  computer-administered  adaptive  tests  which  covered 
the  same  content  as  the  midquarter  examinations.  The  data  analyzed  for  each 
student  consisted  of  scores  on  the  two  classroom  midquarter  examinations  and 
the  corresponding  scores  on  the  first  and  second  midquarter  adaptive  tests. 
The  results  are  based  on  a comparison  of  the  levels  of  information  associated 
with  these  scores. 

Subjects 

Volunteers were recruited during the fall and winter quarters of the
1976-77 academic year. Each quarter an information sheet was distributed to
the students in the class which invited them to participate in the
research project. For participating in the first midquarter adaptive test,
participants received one point which was to be added to their course grade;
for participating in the second midquarter adaptive test, they received
[number illegible] points. During fall quarter 394 students participated in the first
midquarter adaptive testing and 386 participated in the second midquarter
adaptive testing; during winter quarter the corresponding numbers were 317
and 349, respectively.

Procedure 


For  both  the  first  and  second  midquarter  administrations,  the  volunteer 
students  were  given  three  tests  in  the  following  order:  1)  an  adaptive  verbal 
ability  test,  2)  the  multiple-choice  adaptive  biology  test  based  on  the 
content covered in the classroom midquarter examinations, and 3) a test
consisting of specially designed biology items. In the present report, only the
data  from  the  adaptive  biology  tests  were  analyzed. 

The three tests were administered by means of cathode ray terminals
(CRT) connected to a Hewlett-Packard real-time computer system. Instructional
screens explaining the operation of the equipment were presented prior
to testing (DeWitt & Weiss, 1974). A proctor was present in the testing
room at all times to assist students with the equipment. Each test item was
presented separately at the rate of 960 characters per second on the CRT
screen. Students responded by pressing the key corresponding to the chosen
alternative. During the fall quarter administration, feedback was provided
after each response, i.e., each student was informed whether or not he/she
had answered each test item correctly. During the winter quarter
administration, immediate feedback was not provided. There were no time limits imposed
on any of the tests. At the completion of testing, students received a
printed report which listed questions answered incorrectly and provided the
correct answers.


Adaptive Test

Item pools. The development of the item pools used in this study has
been described by Bejar et al. (1977). The answer sheets for two midquarter
examinations from two previous academic quarters were used as raw data for
obtaining the item parameters of the item characteristic curves for the
items: discrimination (a), difficulty (b), and guessing (c). From the
fall quarter administration 114 items were available, which covered the
contents of the first classroom test; the pool for the second test contained
112 items.* From the winter administration 44 items were added to the first
test pool, and 49 were added to the second test pool. There was thus a
total of 158 items in the first test item pool and 161 in the second test
pool.


To construct item pools which could be used for administration of
stradaptive tests (Vale & Weiss, 1975a, b; Weiss, 1973), each of the two pools
was structured by forming nine strata of increasing difficulty. Mean stratum
difficulties were chosen so that there would be approximately the same number
of items per stratum. Within each stratum the items were ordered in terms
of their discriminations unless this resulted in items covering the same
content area appearing consecutively. Appendix Tables A and B show the nine
strata into which the first and second test item pools were structured.
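
As a rough illustration of the structuring rule just described, the following Python sketch sorts a pool by difficulty, cuts it into nine strata of nearly equal size, and orders each stratum from most to least discriminating. The names `Item` and `build_strata` are hypothetical and introduced only for this example. The equal-count split is one assumed way of choosing stratum boundaries, and the content-area exception mentioned above is not modeled.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    """Hypothetical container for one item's parameter estimates."""
    item_id: int
    a: float  # discrimination
    b: float  # difficulty
    c: float  # guessing (lower asymptote)


def build_strata(pool: List[Item], n_strata: int = 9) -> List[List[Item]]:
    """Split a pool into strata of increasing difficulty.

    Assumption: strata are formed by sorting on b and cutting the sorted
    list into n_strata groups whose sizes differ by at most one item.
    Within each stratum, items are ordered by descending discrimination.
    """
    by_difficulty = sorted(pool, key=lambda item: item.b)
    base, extra = divmod(len(by_difficulty), n_strata)
    strata: List[List[Item]] = []
    start = 0
    for s in range(n_strata):
        size = base + (1 if s < extra else 0)
        stratum = by_difficulty[start:start + size]
        # Most discriminating items come first, so they are administered first.
        strata.append(sorted(stratum, key=lambda item: item.a, reverse=True))
        start += size
    return strata
```

With a 114-item pool, this split gives six strata of 13 items and three of 12, which is in the spirit of "approximately the same number of items per stratum"; the actual stratum compositions are those in Appendix Tables A and B.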

Effects of expanding the item pool. In a conventional test the
distribution of item parameters will determine the characteristics of scores derived
from that test. Similarly, in adaptive testing the characteristics of the
items in the item pool should influence the characteristics of the scores.
The theoretical research on this question (Jensema, 1976, pp. 82-89),
however, suggests that improving the item pool has little effect on precision
of measurement.

The question of improving the item pool in adaptive testing was examined
by Jensema, using a simulation study with Owen's (1975) Bayesian adaptive
strategy. Two kinds of pools were studied: one in which a = 1.0 and c = .25
for all items and one in which a = 2.0 and c = .20. The distribution of b's
within the two pools was the same. Jensema's conclusion, that improving the item
pool has no effect on the accuracy of estimating $\theta$, is counter-intuitive. One
potential problem with Jensema's study is that the dependent variable used
was the correlation of $\theta$ and $\hat{\theta}$, which may not be sufficiently sensitive to
detect changes in precision. Furthermore, the composition of the pools used
by Jensema was atypical, since all the items within a pool were assumed to have the same
discrimination. Consequently, his results lack generalizability.

The  present  data  permit  a more  realistic  assessment  of  the  effects  of 
item pool characteristics on the precision of adaptive test scores.
Specifically, the response vector information functions computed for both adaptive
tests  in  the  winter  data  were  based  on  an  enlarged  version  of  the  fall  item 
pool.  The  items  that  were  added  to  both  pools  consisted  of  those  items 
administered  in  the  fall  classroom  test  for  which  it  was  possible  to  obtain 
item  parameter  estimates. 


*Bejar et al. (1977) reported that the second midquarter item pool contained
123 items; the 112-item pool resulted from the removal of 11 items which were
administered in a special format as the third test.

Table 1 shows the mean and standard deviation of item parameter estimates
for both fall item pools and the same statistics for the items added to form
the winter pool. For the first test the mean discrimination of the added items was
somewhat lower than that of the items in the fall pool. For the second test the
added items were, on the average, slightly more discriminating. In terms of
difficulty, the added items in the first test pool were, on the average,
.10 easier. In the second test pool, the added items were only .02 easier.
Appendix Tables A and B show that the added items were well distributed across
the nine strata of the stradaptive test pools. In addition, within strata,
the new items were well distributed in their order of administration. Average
stratum discriminations were higher for the improved (winter) pool for only
three of the nine strata in the Test 1 pool (Table A) and six of the nine
strata in the Test 2 pool (Table B). In no case were the differences in mean
discrimination very large.


Table 1

Mean and Standard Deviation of Item Parameter Estimates for Fall Item
Pool and for Items Added to Winter Item Pool for Adaptive Tests 1 and 2

                                          a               b               c
Test and Pool               Number    Mean   S.D.     Mean   S.D.     Mean   S.D.

Test 1
  Items in Fall Item Pool     114     1.21    .46      .18   1.22      .25    .09
  Items Added for Winter       44     1.15    .37      .08   1.12      .30    .06

Test 2
  Items in Fall Item Pool     112     1.20    .40      .16   1.16      .27    .09
  Items Added for Winter       49     1.22    .40      .14   1.23      .29    .07


Implementation. One of the advantages of the stradaptive testing
strategy is that prior information can be used to select the stratum from
which the first item is administered. In this study the entry point was
selected by the student; at the beginning of each stradaptive test students
were asked to state their grade-point-average (GPA) by selecting one of nine
equally-spaced GPA intervals from 2.00 to 4.00 (DeWitt & Weiss, 1974, p. 49).
On the assumption that overall GPA was related to biology achievement levels,
students with the highest GPAs began the stradaptive test with an item at the most
difficult stratum (Stratum 9), while those with the lowest GPAs began with an
item at the least difficult stratum (Stratum 1).

A variable criterion was used to terminate testing on the stradaptive
test. After a student answered five items in a stratum, if he/she answered
20% or fewer correctly, testing was terminated. If testing was not terminated
by this criterion after 50 items had been administered, no further items
were administered.

The branching strategy used in the stradaptive test was: 1) if the
current item was answered incorrectly or skipped, to administer the next
unadministered item from the next easier stratum, or 2) if the current item
was answered correctly, to administer the next unadministered item from the
next more difficult stratum.
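
The branching and termination rules above can be sketched in Python as follows. This is a minimal illustration and not the program used in the study. The function name `run_stradaptive` and the `answer` callback are hypothetical. Strata are indexed from 0 (easiest) to 8 (most difficult) instead of 1 to 9. Three details are assumptions the text does not settle: a testee already in the extreme stratum stays there, testing stops if the current stratum runs out of items, and the termination rule is read as "at least five items answered in a stratum, with 20% or fewer correct."

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_stradaptive(
    strata: Sequence[Sequence[object]],
    entry: int,
    answer: Callable[[object], Optional[bool]],
    max_items: int = 50,
) -> List[Tuple[object, int, bool]]:
    """Administer items by up/down branching across difficulty strata.

    answer(item) returns True (correct), False (incorrect) or None (skipped).
    Returns a list of (item, stratum_index, scored_correct) tuples.
    """
    next_index = [0] * len(strata)               # next unadministered item per stratum
    record: List[List[bool]] = [[] for _ in strata]
    administered: List[Tuple[object, int, bool]] = []
    s = entry
    while len(administered) < max_items:
        if next_index[s] >= len(strata[s]):
            break                                 # assumption: stop when a stratum is exhausted
        item = strata[s][next_index[s]]
        next_index[s] += 1
        correct = answer(item) is True            # a skip branches like an incorrect answer
        administered.append((item, s, correct))
        record[s].append(correct)
        # Variable termination criterion, as interpreted in the lead-in.
        if len(record[s]) >= 5 and sum(record[s]) / len(record[s]) <= 0.20:
            break
        # Branch up after a correct answer, down otherwise (clamped at the ends).
        s = min(s + 1, len(strata) - 1) if correct else max(s - 1, 0)
    return administered
```

For example, a testee who answers everything in strata 0 through 4 correctly and everything above incorrectly, entering at index 4, alternates between indices 4 and 5 and is stopped after the fifth miss in index 5, that is, after ten items.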

Conventional Tests


Classroom  tests.  The  classroom  examinations  each  quarter  included  55 
items,  which  the  course  staff  selected  by  a combination  of  pedagogical 
criteria  and  procedures  from  traditional  test  theory.  Their  aim  in 
constructing these tests was to produce a "good" test for purposes of course
grading.  Students  were  instructed  to  answer  50  items  of  their  choice.  For 
purposes  of  this  research,  however,  the  tests  were  shorter  than  50  items,  since 
item  parameter  estimates  were  not  available  for  some  of  the  items. 

The item parameter estimates for the items in Fall Tests 1 and 2 (F1 and
F2) are in Appendix Table C; those for Winter Tests 1 and 2 (W1 and W2)
are in Appendix Table D.

Improved tests. A major problem in comparing testing procedures is that
their inherently dissimilar characteristics frequently make equitable
comparisons difficult. The problem can be alleviated by allowing each strategy
to function optimally while equating the testing procedures on relevant
characteristics. The classroom exams were not expected to be psychometrically
optimal; therefore, it was necessary to compare the stradaptive tests with
an improved conventional test drawn from the same item pool. The winter
item pool contained all the items available; therefore, only the
winter data were used in the construction of the improved conventional
tests.

The improved conventional test was designed to use the most
discriminating items in the item pool in order to measure individual differences in
the range of course achievement within which differential grades would be
assigned. That is, it was assumed that below a given level of "passing" the
course, further differentiations among students were unnecessary; above that
level, it was desirable to differentiate as accurately as possible among the
students in order to assign differential grades. To permit a psychometrically
meaningful comparison with the adaptive test, the improved conventional tests
were also designed to be equivalent to the adaptive test in terms of levels
of item discrimination and number of items administered.

A comparison of the mean discriminations for the original winter quarter
classroom tests with the item pools used for the stradaptive test showed that
the mean for the stradaptive pool was a = 1.19 for the W1 item pool and a = 1.21
for the W2 item pool. Mean discriminations for the winter classroom tests
were 1.09 and 1.14, respectively. The comparison between the item
discriminations of the two testing strategies is complicated, however, by the way items
are selected for administration in the stradaptive test. Since the items in
each stratum in the stradaptive pool were ordered by their discriminations
and the branching strategy is designed to administer the earlier items in the
strata first, the mean discrimination of the stradaptive item pools will be
lower than the mean discrimination of items administered in most stradaptive
tests.

To provide a fair comparison between the adaptive and the conventional
tests, it would be necessary to construct a conventional test "matching" the
item discriminations in the adaptive test. This is difficult to implement,
however, since the discriminations in each administration of the adaptive
test will differ. Instead, the improved conventional tests were designed to
provide a comparison which would not favor the adaptive test in terms of
mean item discrimination.

The improved conventional tests for each of the two midquarters were
constructed by selecting the items which appeared first in the strata of the
stradaptive pool; these were generally the most discriminating items in the
strata. The number of items selected was based on the overall mean test
length for the stradaptive test. The items which were ordered first in the
top seven strata of the stradaptive tests were selected to constitute the
improved conventional tests. Only seven strata were used rather than nine,
so that the improved conventional test would be somewhat peaked. Its
precision would thus be concentrated in the range of achievement most relevant
for instructional decisions. The improved conventional tests consisted of
24 items each for both the first and second tests administered in the winter
quarter; they were based on a stradaptive test with a maximum of 30 items
which had a mean test length of approximately 24 items.

The item parameters for the items constituting the two improved
conventional tests are shown in Appendix Table E. The first 21 items comprise the
first three items in Strata 3 through 9 for both tests. In the improved
conventional tests the last three items were the fourth items in Strata 7
through 9. The items in the improved conventional tests had mean discrimination
values of 1.73 and 1.76, respectively, for the two midquarter examinations; for
the stradaptive pools the mean discriminations were 1.19 and 1.21.

Because of the way the stradaptive item pool is structured and the way
stradaptive test items are selected, the mean discrimination of the improved
conventional test would be equal to or greater than that of any stradaptive
test. The mean discrimination of the two testing procedures would be equal
solely for a testee whose stradaptive test response record included only the
items in the improved conventional test. For any testee whose responses on
the stradaptive test required administration of items farther down the strata
than those used by the improved conventional test, the mean discrimination
would be lower than that of the conventional test. Since the majority of
stradaptive response records utilize items beyond the third item in the strata,
the stradaptive tests generally would use items of lower average discrimination
than would the improved conventional test.

Scoring

All tests were scored by maximum likelihood estimation, specifying
Birnbaum's (1968) three-parameter logistic model as the response model. The
item parameter estimates were edited by the scoring program so that the
maximum value of the discrimination parameter (a) was set to 2.5, the maximum
absolute value of the difficulty parameter (b) was set to 3.00, and the
maximum value of the guessing parameter (c) was set to .33. In estimating
achievement levels, omitted items were not scored as incorrect; they were
merely ignored.
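
A minimal Python sketch of this kind of scoring is given below. It is not the scoring program used in the study. The function names are hypothetical, and the starting value of 0, the step limit of 1.0, the restriction of the estimate to the range -4 to +4, and the fallback step taken when the second derivative is not negative are all assumptions added to keep the iteration stable. The parameter editing and the treatment of omitted items follow the description above.

```python
import math
from typing import Optional, Sequence, Tuple

D = 1.7  # scaling constant of the logistic model


def edit_params(a: float, b: float, c: float) -> Tuple[float, float, float]:
    """Cap a at 2.5, |b| at 3.00 and c at .33, as described in the text."""
    return min(a, 2.5), max(-3.0, min(b, 3.0)), min(c, 0.33)


def log_likelihood_derivatives(
    theta: float,
    items: Sequence[Tuple[float, float, float]],
    responses: Sequence[Optional[int]],
) -> Tuple[float, float]:
    """First and second derivatives of the 3PL log-likelihood at theta."""
    d1 = d2 = 0.0
    for (a, b, c), u in zip(items, responses):
        if u is None:
            continue                      # omitted items are ignored, not scored wrong
        a, b, c = edit_params(a, b, c)
        ex = math.exp(D * a * (theta - b))
        d1 -= D * a * ex / (1.0 + ex)
        d2 -= D * D * a * a * ex / (1.0 + ex) ** 2
        if u == 1:
            d1 += D * a * ex / (c + ex)
            d2 += D * D * a * a * c * ex / (c + ex) ** 2
    return d1, d2


def ml_estimate(items, responses, theta=0.0, tol=1e-8, max_iter=100):
    """Newton-Raphson estimate of theta; returns (theta_hat, observed information)."""
    for _ in range(max_iter):
        d1, d2 = log_likelihood_derivatives(theta, items, responses)
        # Assumption: if the curvature is not negative, move one unit uphill instead.
        step = d1 / d2 if d2 < 0 else -math.copysign(1.0, d1)
        step = max(-1.0, min(1.0, step))              # assumed step limit
        theta = max(-4.0, min(4.0, theta - step))     # assumed bounds on the estimate
        if abs(step) < tol:
            break
    _, d2 = log_likelihood_derivatives(theta, items, responses)
    return theta, -d2
```

Response patterns that are all correct or all incorrect have no finite maximum likelihood estimate; in this sketch they simply run to the assumed bound.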


Information

Definitions. Equation 1 gives the test information function of a test
consisting of n items in relation to the logarithm of the likelihood function
of response pattern V (see Samejima, 1969):

$$ I(\theta) = -E\left[\frac{\partial^2 \log L_V(\theta)}{\partial \theta^2}\right] \tag{1} $$

where $L_V(\theta)$ is the likelihood function, and
V is the pattern of correct/incorrect responses to a set of test items.
That is, information is the (negative) expected value of the second
derivative of the log likelihood function. "Psychometric information," defined in
this way, is identical to Fisher's concept of information (cf. Edwards, 1971).

In this study the comparison between the conventional and adaptive tests
was based on observed information functions. These were computed from the
item responses given by each testee. The observed value of information, as
opposed to the expected value, is the (negative) value of the second derivative
of the log-likelihood function at a testee's estimated value of $\theta$. That is,

$$ \hat{I}(\hat{\theta}) = -\left.\frac{\partial^2 \log L_V(\theta)}{\partial \theta^2}\right|_{\theta=\hat{\theta}} \tag{2} $$

Equation 2 defines the response vector counterpart of Samejima's (1969, Ch. 6)
item response information function which she has called the response pattern
information function (Samejima, 1973).


For the 3-parameter logistic model, $\hat{I}(\hat{\theta})$ is given by

$$ \hat{I}(\hat{\theta}) = \sum_{g=1}^{n} D^2 a_g^2 e^{x_g} \left[ \frac{1}{(1+e^{x_g})^2} - \frac{u_g c_g}{(c_g+e^{x_g})^2} \right] \tag{3} $$

where D = 1.7,
$a_g$ = the estimate of the discriminating power of the item,
$c_g$ = the estimate of the lower asymptote of the item characteristic curve,
$x_g = D a_g(\hat{\theta} - b_g)$,
$b_g$ = the estimate of the difficulty of the item, and
$u_g$ = 1 if the item is answered correctly, 0 if the item is answered incorrectly.


It is clear from Equation 3 that for a single item, $\hat{I}_g(\hat{\theta})$ takes one of two
values, namely

$$ \frac{D^2 a_g^2 e^{x_g}}{(1+e^{x_g})^2} - \frac{D^2 a_g^2 c_g e^{x_g}}{(c_g+e^{x_g})^2} \tag{4} $$

or

$$ \frac{D^2 a_g^2 e^{x_g}}{(1+e^{x_g})^2} \tag{5} $$

Equation 4 occurs with probability $P = P_g(\hat{\theta}) = c_g + (1-c_g)\,e^{x_g}/(1+e^{x_g})$, while
Equation 5 occurs with probability $Q = 1 - P_g(\hat{\theta})$. Thus the expected
value of $\hat{I}_g(\hat{\theta})$ (i.e., $I_g(\hat{\theta})$) is

$$ I_g(\hat{\theta}) = (1-P)\,\frac{D^2 a_g^2 e^{x_g}}{(1+e^{x_g})^2} + P\left[\frac{D^2 a_g^2 e^{x_g}}{(1+e^{x_g})^2} - \frac{D^2 a_g^2 c_g e^{x_g}}{(c_g+e^{x_g})^2}\right] \tag{6} $$

which is the usual item information function evaluated at $\hat{\theta}$ (see Birnbaum,
1968, Eq. 20.4.20). The sum of the $I_g(\hat{\theta})$ across all items administered in
a test at a given value of $\hat{\theta}$ is $I(\hat{\theta})$, which is the theoretical test
information function based on estimated values of $\theta$. Brown and Weiss (1977) used the
evaluation and summation of item response information functions by Equation 6
at an estimated value of $\theta$ in their live-data simulation study to obtain
estimated information curves; their $\hat{\theta}$'s, however, were based on a Bayesian
scoring routine.

Both $\hat{I}(\hat{\theta})$ and $I(\hat{\theta})$ depend on the item parameter estimates $\hat{a}$, $\hat{b}$, and $\hat{c}$.
However, $I(\hat{\theta})$ is one step further removed from the data, since it does not
allow the observed response pattern of correct and incorrect responses to
dictate its value, whereas $\hat{I}(\hat{\theta})$ does. In theory $\hat{I}(\hat{\theta})$ may be considered an
estimate of $I(\hat{\theta})$, and it is easily obtained during the estimation of $\theta$ by the
Newton-Raphson procedure, which requires both the first and second derivatives of
the log-likelihood function. The (negative) value of the second derivative of the
log-likelihood function at the last iteration is $\hat{I}(\hat{\theta})$.
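
The relation between the observed and the theoretical item information can be checked numerically. The Python sketch below uses hypothetical function names and illustrative parameter values that are not taken from the item pools. It computes the two possible observed values for a single item under the three-parameter logistic model, forms their probability-weighted average, and compares that average with the familiar closed form $D^2 a^2 (Q/P)[(P-c)/(1-c)]^2$ of the item information function.

```python
import math

D = 1.7  # scaling constant of the logistic model


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model."""
    ex = math.exp(D * a * (theta - b))
    return c + (1.0 - c) * ex / (1.0 + ex)


def observed_item_info(theta: float, a: float, b: float, c: float, u: int) -> float:
    """Negative second derivative of one item's log-likelihood at theta.

    u = 1 for a correct response, u = 0 for an incorrect one.
    """
    ex = math.exp(D * a * (theta - b))
    info = D * D * a * a * ex / (1.0 + ex) ** 2
    if u == 1:
        info -= D * D * a * a * c * ex / (c + ex) ** 2
    return info


def expected_item_info(theta: float, a: float, b: float, c: float) -> float:
    """Probability-weighted average of the two observed values."""
    p = p_correct(theta, a, b, c)
    return (p * observed_item_info(theta, a, b, c, 1)
            + (1.0 - p) * observed_item_info(theta, a, b, c, 0))


def closed_form_item_info(theta: float, a: float, b: float, c: float) -> float:
    """Closed-form 3PL item information, D^2 a^2 (Q/P) [(P - c)/(1 - c)]^2."""
    p = p_correct(theta, a, b, c)
    return D * D * a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2
```

When c is greater than zero, a correct response contributes less observed information than an incorrect one at the same value of theta, because a correct answer may have been a guess; when c is zero the two observed values coincide.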

Computation. Using the maximum likelihood scores computed for each
testee on the conventional and adaptive tests, information was computed
for each testee during the scoring process by evaluating the second derivative
of the log-likelihood function at the final estimated value of $\theta$, based on
test items actually administered. The response vector information curves
for a given testing strategy were then obtained by grouping students on their
estimated achievement ($\hat{\theta}$) in intervals of .20 from -2.00 to +2.00. The mean
response vector information for students within a given interval of $\hat{\theta}$ was
assigned to the midpoint of that interval. All information values presented
below have been multiplied by 1/2.89 (i.e., $1/D^2$, with D = 1.7).
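
The grouping step can be sketched in Python as follows. The function name `information_curve` is hypothetical. Treating each interval as closed on the left and open on the right, and dropping estimates outside the range -2.00 to +2.00, are assumptions; the text does not say how boundary cases were handled.

```python
from typing import List, Sequence, Tuple


def information_curve(
    thetas: Sequence[float],
    infos: Sequence[float],
    lo: float = -2.0,
    hi: float = 2.0,
    width: float = 0.2,
    scale: float = 1.0 / 2.89,
) -> List[Tuple[float, float]]:
    """Mean observed information per theta interval, reported at the midpoint.

    Returns (midpoint, scaled mean information) pairs for non-empty intervals.
    """
    n_bins = round((hi - lo) / width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for theta, info in zip(thetas, infos):
        if lo <= theta < hi:                      # assumption: half-open range
            k = min(int((theta - lo) / width), n_bins - 1)
            sums[k] += info
            counts[k] += 1
    return [
        (round(lo + (k + 0.5) * width, 2), scale * sums[k] / counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]
```

With these defaults there are 20 intervals, and the reported midpoints run from -1.9 to +1.9 in steps of .20.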

Comparison. No studies have been reported which utilized the item
response pattern information function [$\hat{I}(\hat{\theta})$] computed from live-testing data;
therefore, it was appropriate to compare the results of computing information
by this method with the information curves derived from the sum of the
item information functions. The computation of test information curves from
the sum of item information curves assumes that real testees respond to items
in the test in accordance with the item characteristic curve (ICC) model.
On the other hand, computing information curves using Equation 3 from the
item response pattern of real testees will likely include some error, since
not all testees respond strictly in accordance with the model. A
comparison of the two information curves derived from the same set of item responses
was, therefore, useful to evaluate the applied usefulness of response pattern
information functions, as well as to indicate whether or not the responses of
the students to this achievement test were widely discrepant from the ICC
model.

Figure 1
Observed and Theoretical Test Information Functions for Test F1

Figure 2
Observed and Theoretical Test Information Functions for Test F2


Consequently, test information curves were computed from the sum of the
item information functions (Equation 6) and from the response pattern
information functions (Equation 3), using the responses to the conventional test for
each of the four midquarter examinations.


Results


Test Information Versus Response Pattern Information

Figures 1 through 4 show for the four classroom biology examinations
the test information curves computed from 1) the sum of the item information,
i.e., the theoretical test information function [$I(\hat{\theta})$]; and 2) the response
pattern information curves, i.e., the observed test information function
[$\hat{I}(\hat{\theta})$]. Data for the test information functions are in Appendix Table F;
data for the response pattern information functions are in Tables 2 and 3.

The data for fall quarter (Figures 1 and 2) show that item response
pattern information [Î(θ)] consistently underestimated the theoretical
curve derived by summing the item information functions [I(θ)]. The
difference was fairly consistent throughout the θ range, although for the
first test (F1) the discrepancy diminished at the lower end of the θ
continuum, where θ ≤ -1.40. For both sets of data the largest differences
appeared to be at the point of highest information; the magnitude of the
differences decreased with decreasing information levels.

The winter data (Figures 3 and 4) exhibited the same general pattern of
results. It can be seen from Figures 3 and 4 that Î(θ) again underestimated
the value of the theoretical test information function. In the first test
(W1) there was a marked decrease in the discrepancy between the two curves
for those values of θ less than about -1.25; in the second test (W2) data the
two curves were closest together at values of θ less than 1.50. The winter
data, however, did not fully support the tendency for the two curves to be
farthest apart at the point of highest information; this tendency occurred
in the W2 data, but not in the W1 data.

There  were  thus  three  trends  common  across  all  four  examinations: 

1. The observed (response pattern) curve was always an underestimate
of the theoretical (test) information curve;

2. The differences between the two information curves tended to
diminish, and in some cases disappear, at low levels of θ; and

3. There was a fairly constant difference between the two information
curves throughout the range of θ above -1.00.

Adaptive Test Versus Classroom Test

First tests. Table 2 shows the values of observed information
(response pattern information) from the first classroom and adaptive tests
for fall (F1) and winter (W1). The results are plotted in Figures 5 and
6, which show that for both fall and winter the adaptive test yielded a
substantially higher amount of information at all levels of achievement
greater than θ = -1.5. Because the adaptive test was shorter, on the average,
this is particularly significant.

Figure  5 

Observed Information Functions for F1 Classroom
and Adaptive Tests


As previously indicated, the number of items was not fixed for the
adaptive test. Although the maximum test length for the adaptive test was
50 items, Table 2 shows that on the average testing was terminated
after 27.2 items in Test F1 and after 31.6 items in Test W1. Excluding
students at the extremes of the θ distribution (where the stradaptive test
would tend to terminate prematurely because suitable items were not available
in the pool), the range of mean number of items to termination on the
stradaptive test was 18.9 to 32.5 for Test F1 and 25.9 to 41.1 for Test W1.

On the other hand, the mean information values for the classroom test were
based on an average of 35 items for Test F1 and 40.5 items for Test W1.
(Although the actual classroom test was 50 items long, there were items
for which no item parameter estimates were available.)

Thus, even though the adaptive test on the average was about eight
items shorter, it yielded a much more precise estimate of achievement.
For example, at θ = .7 (and .9) the F1 classroom test had maximum information
of 2.90, whereas at θ = .7 the adaptive test had information of 5.07.

Figure 6

Observed Information Functions for W1 Classroom
and Adaptive Tests

[Horizontal axis: Estimated Achievement Level (θ̂), -2.0 to 2.0]


The ratio of information values at θ = .7 was 1.75. This means that for the
conventional test to be equal in precision to the adaptive test at that
level of θ, it would have to be increased in length by about 75%. This
would result in a conventional test of 61 items in order to achieve the
same quality of measurement as a stradaptive test with a mean of 31.3 items.

The stradaptive test achieved its highest level of information (5.77)
at θ = 1.1 with an average of 29.1 items; the information provided by the
classroom test at θ = 1.1 was 2.63. The ratio of 2.19 indicates that at
this level of θ the classroom test would require 77 items to measure as
well as the 29.1-item average adaptive test.

Similar results were observed for the W1 data. At the point where
the classroom test was most informative, θ = -1.3, the adaptive test was
more informative by a factor of 4.64/3.55 = 1.31 with, on the average, 6.4
fewer items. Thus, at that level of θ the classroom test would require
53 items to measure as precisely as the average 34-item stradaptive test.
At the point where the adaptive test was most informative, θ = 1.1, the
improvement factor was 6.74/2.12 = 3.18. The classroom test, therefore,
would require 129 items to measure as precisely as the 37-item adaptive
test. Thus, even when comparisons were made at the point of maximum
information for the classroom test, the adaptive test was more efficient
in terms of information per item. When the comparison was made at the
point of maximum information for the adaptive test, the discrepancy in
efficiency for the two testing procedures was even greater.
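The equivalent test lengths quoted in this section follow from multiplying the classroom test length by the ratio of adaptive to classroom information. The short sketch below reproduces that arithmetic from the information values and test lengths reported in the text. It assumes, as the text's reasoning implies, that information grows in proportion to the number of comparable items; the function names are hypothetical.

```python
def information_ratio(info_adaptive, info_classroom):
    """Ratio of adaptive to classroom information at one level of theta."""
    return info_adaptive / info_classroom

def equivalent_length(info_adaptive, info_classroom, classroom_length):
    """Classroom test length needed to match the adaptive test's
    information, assuming information is proportional to test length."""
    return classroom_length * information_ratio(info_adaptive, info_classroom)

# (label, adaptive information, classroom information, classroom length)
comparisons = [
    ("F1, theta = .7", 5.07, 2.90, 35.0),
    ("F1, theta = 1.1", 5.77, 2.63, 35.0),
    ("W1, theta = -1.3", 4.64, 3.55, 40.5),
    ("W1, theta = 1.1", 6.74, 2.12, 40.5),
]

for label, adap, cls, n in comparisons:
    ratio = information_ratio(adap, cls)
    print(f"{label}: ratio {ratio:.2f}, "
          f"equivalent length {equivalent_length(adap, cls, n):.1f} items")
```

Rounded to whole items, the four results agree with the 61, 77, 53, and 129 items given above.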
Second tests. Table 3 shows the number of testees, the mean number
of items, and the mean information as a function of θ for the second
classroom and adaptive tests administered during fall (F2) and winter
(W2). Estimated information curves are plotted for these tests in
Figures 7 and 8. Figure 7 shows that for the F2 data, the adaptive
test was generally superior to the classroom test. In the interval from
θ = -.50 to θ = .20, however, the classroom test yielded higher levels of
information.

Figure  7 

Observed Information Functions for F2 Classroom
and Adaptive Tests


Figure 8 shows the results for W2. For all θ values greater than
-1.5, the information provided by the adaptive test was substantially
higher than that of the classroom test; this was similar to the findings
for F1 and W1. The adaptive test thus provided better measurement
throughout the θ range in three of the four tests.

There are two explanations for the adaptive test providing less
information than the conventional test for the F2 data in a narrow range
around the mean of the θ distribution. First, as Appendix Table C shows,
the F2 classroom test was a considerably more peaked test than the F1,
W1, and W2 classroom tests. Peaked tests tend to have peaked information
functions (Lord, 1970), since they concentrate all their measurement
efficiency near one point on the θ continuum.

A more important explanation, however, is seen in Table 3. In the
range of θ = -.5 to .10, the mean adaptive test length was substantially
below the mean classroom test length. Dividing the information at each
of these values of θ by the corresponding test length indicates that
the mean information per item was higher for the adaptive test than for
the classroom test. For example, at θ = -.5 the mean information per item
was .11 for the adaptive test and .09 for the classroom test. Thus, while
observed mean information was lower for the adaptive F2 data, this was
merely an artifact attributable to the termination rule employed
in the test, which resulted in very short tests in the θ range of -.5 to
.20.

Figure 8

Observed Information Functions for W2 Classroom
and Adaptive Tests


Summary. The results from both test administrations show that when
differences in test length were taken into account, the adaptive tests
yielded substantially more precise estimates of achievement than any of
the conventional tests at all levels of achievement. The results,
summarized in Table 4, were equally favorable to adaptive testing when
all θ levels were combined. As shown in Table 4, the mean information across
levels of θ for the F1 data was 4.53 for the adaptive test and 2.36 for
the classroom test, with test lengths of 27.2 and 35.0 items, respectively.
The information ratio of 1.92 in favor of the adaptive test implies that
the classroom test would require 67 items in order to measure as precisely
as the average 27-item adaptive test. The results for the other three
tests (W1, F2, and W2) also showed the overall superiority of the adaptive
test while reducing test length. The smallest improvement was for the F2
data; the ratio of mean information for the F2 test was 1.24 in favor of
the adaptive test, implying that the conventional test would require 46
items to measure as well as an average 32-item adaptive test. This
represents a reduction of 30% in classroom test length attributable to
adaptive testing, while achieving equivalent average precision with the
peaked classroom test.

Table 4

Mean Information and Mean Test Length for Fall
and Winter Adaptive and Classroom Tests

            Mean Information              Mean Test Length
Test     Adap.    Class.   Ratio(a)    Adap.    Class.   Difference(b)

F1       4.53     2.36     1.92        27.2     35.0     7.8
W1       5.28     2.89     1.83        31.6     40.5     8.9
F2       3.79     3.06     1.24        31.7     37.3     5.6
W2       4.64     3.03     1.53        32.0     40.3     8.3

(a) Adaptive divided by Classroom
(b) Classroom minus Adaptive
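The derived columns of Table 4 can be rechecked directly from its mean information and mean test length entries. The sketch below is illustrative only: the names are hypothetical, the per-item information values are not reported in Table 4 itself, and the equivalent classroom length again assumes that information is proportional to the number of items.

```python
# Table 4 entries:
# (adaptive information, classroom information, adaptive length, classroom length)
TABLE_4 = {
    "F1": (4.53, 2.36, 27.2, 35.0),
    "W1": (5.28, 2.89, 31.6, 40.5),
    "F2": (3.79, 3.06, 31.7, 37.3),
    "W2": (4.64, 3.03, 32.0, 40.3),
}

def summarize(info_adap, info_class, len_adap, len_class):
    """Recompute the ratio and difference columns plus two derived values."""
    ratio = info_adap / info_class
    return {
        "ratio": round(ratio, 2),                       # adaptive / classroom
        "difference": round(len_class - len_adap, 1),   # classroom - adaptive
        "info_per_item_adaptive": round(info_adap / len_adap, 3),
        "info_per_item_classroom": round(info_class / len_class, 3),
        "equivalent_classroom_length": round(len_class * ratio),
    }

for test, row in TABLE_4.items():
    print(test, summarize(*row))
```

For F1 and F2 the equivalent classroom lengths come out at 67 and 46 items, matching the figures cited in the summary.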

Adaptive Versus Improved Conventional Test

Test W1. Since the improved conventional test was not actually
administered, it was not possible to compute its response vector information
function. Instead, mean values of the test (theoretical) information
function were computed at 20 levels of θ̂ using Equation 6. The obtained
values are in Appendix Table G, which also shows the mean values of
response pattern information for the adaptive W1 test, rescored using
maximum test lengths of 40, 30, and 20 items. Based on the data in
Table G, Figure 9 shows the corresponding response pattern information
curves for the stradaptive test at 20- and 30-item maximum lengths and the
test information curve for the improved conventional test.

As Figure 9 shows, test information for the improved conventional test
was very low for the low levels of achievement. Since there were no items
in this test with difficulties less than b = -.65, this was to be expected.
The significant comparison between the two testing strategies is for θ̂
values greater than approximately -.40, as indicated by the vertical dashed
line in Figure 9. Within this range, both the adaptive tests had maximum
information at θ̂ = 1.10, while the information curve for the improved
conventional test was almost at its peak. The mean response vector
information for the 20-item maximum length adaptive test at θ̂ = 1.10 was
4.78; the corresponding value of information for the improved conventional
test was 4.52. This represents a 6% increase in information, with an
average decrease of 5 items.

A more significant comparison can be made with the 30-item maximum adaptive
test, since, on the average, it was 24 items long and therefore the same
length as the improved conventional test. Throughout the range of θ̂, as
well as in the range in which the improved conventional test was designed to
function optimally, the value of response vector information for the 30-item
maximum adaptive test was substantially higher than that for the improved
conventional test (see Figure 9). Specifically, at θ̂ = 1.3, where the improved
conventional test had the highest information, the 30-item maximum adaptive
test had at least 7% more information. At that specific value of θ̂, the
mean test length for the stradaptive test was 22.8 items, or 1.2 items shorter
than the improved conventional test. The improved conventional test would
then require 25.7 items in order to measure as precisely as the average
22.8-item adaptive test. Thus, with test length and average item
discriminations equal, the adaptive process resulted in measurement of
higher precision.

Figure 9

Mean Information as a Function of Estimated Achievement
Level for Improved Conventional Test and Adaptive Test

Test W2. Appendix Table H shows the values of the theoretical test
information functions for the improved conventional test, as well as the
mean values of response vector information for the adaptive test rescored
with maximum lengths of 40, 30, and 20 items. The information curves based
on these data for the conventional test and for the 20- and 30-item adaptive
tests are plotted in Figure 10.

The information for the improved conventional test was again very low
for θ̂ < -1.00 (see Figure 10), because of the way in which items were selected;
the lowest difficulty level for an item in the conventional test was b = -.61.
For θ̂ values in the range providing an equitable comparison of the two testing
procedures, the information values for the improved conventional test were
higher than those for the adaptive test with a maximum length of 20 items,
for θ̂ > .20. However, the mean number of items for the adaptive test at these
levels of θ̂ was always less than 20, or four to eight items shorter than
that of the improved conventional test.

Figure 10

Mean Information as a Function of Estimated Achievement
Level for Improved Conventional Test and Adaptive Test

[Horizontal axis: Estimated Achievement Level (θ̂)]


The comparison of the information curve for the 24-item conventional
test with that of the maximum 30-item adaptive test provided a comparison
of the two testing procedures which is equated for mean number of items,
since under these conditions an average of 24 items was administered in
the adaptive test. In the relevant range of θ̂, the adaptive test provided
generally higher levels of information, except at θ̂ = .3 and θ̂ = 1.1, where
information provided by the conventional test was slightly higher, and at
θ̂ = 1.3, where both testing procedures provided equal levels of information
(see Figure 10). The adaptive test administered two fewer items, on the
average, at θ̂ = .3 than the improved conventional test; at the other two
values of θ̂ the number of items administered was the same.

Summary. The comparisons between the improved conventional tests and
the adaptive tests showed that 1) adaptive tests provided higher
levels of information with fewer items than the improved conventional test and
2) adaptive tests provided generally higher levels of information with
approximately the same mean number of items. The comparisons were based
on tests with comparable values of item discriminations, although the
discriminations in the improved conventional test were generally higher than
the mean discriminations of the adaptive tests. One additional factor
further influenced these comparisons in favor of the conventional test:
the data on which the comparisons were based were the theoretical information
functions for the conventional test and the observed (response vector)
information functions for the adaptive test. As Figures 1 to 4 show,
the theoretical information values consistently overestimated the observed
information values. Thus, the information values for the conventional tests
are probably higher than they would be had they been computed from the
response vectors of actual testees. As a result, it can be concluded that
when adaptive and conventional tests are matched in terms of test length
and average item discriminations, the adaptive test results in consistently
higher levels of information. The improvement in precision resulting
from adaptive testing is a function of the process of selecting test items
appropriately matched to the testee's estimated level of achievement.

Effect of Expanding the Item Pool


First tests. The response vector information curves for Tests F1
and W1 are in Figures 1 and 3, respectively; mean information values are
in Table 2. As Table 2 shows, however, mean test lengths, as well as
mean information, differed for the two tests. Both mean test length and
mean information were higher for the W1 test, which utilized the enlarged
item pool. Consequently, a direct comparison between the two information
curves would be confounded by test length.


Figure  11 

Mean  Information  Divided  by  Mean  Number  of  Items 
for  Fall  and  Winter  Adaptive  Tests  Using 
Test 1 Item Pools

To determine whether or not there had been an improvement in information
beyond that attributable to increased test length, the mean response pattern
information at each level of θ was divided by the corresponding mean test
length. These data are shown in Appendix Table I and the resulting curves
are plotted in Figure 11. The two curves differed very little until the
point at which θ = .10. Thereafter and until the point at which θ = .70, the
winter data provided slightly more information. After this point the winter
pool failed to provide levels of information as high as those of the fall
pool. In terms of overall information, however, there was no increase in
mean information from fall to winter.

Second tests. Mean values of response vector information for the
fall and winter are shown in Table 3, and information curves are plotted
in Figures 2 and 4. Figure 12 shows the two information curves equated
for mean test length at each interval of θ; numerical values are in
Appendix Table I.


Figure  12 

Mean  Information  Divided  by  Mean  Number  of  Items 
for Fall and Winter Adaptive Tests Using
Test 2 Item Pools


The winter pool provided higher levels of information than the
fall pool at almost all levels of θ (see Figure 12). The differences were
particularly large in the interval θ = -.5 to θ = 1.10. The mean response
vector information values equated for test length across all levels of
θ for fall and winter were .12 and .15, respectively; their ratio was
1.25, which represented a 25% increase in information attributable to the
expanded item pool with test length held constant.

Summary. These results show the expected outcome. That is, the
improvement in precision of measurement as a function of enlarging the item
pool depends on the nature of the items added to the pool. For the first
tests, the additional items were slightly less discriminating than the items
already in the pool; therefore, using the enlarged winter quarter pool did
not provide precision of measurement which was appreciably better. For
the second tests, however, the items added to form the winter pool were,
on the average, slightly more discriminating than the items already in the
pool. Scores derived from the enlarged winter item pool were thus more
precise than those from fall.

Summary and Conclusions

This report compared the information provided by typical classroom
achievement tests and improved conventional tests with levels of information
provided by adaptive achievement tests measuring the same course material.
The evaluation criterion was response pattern information, a measure of
information which can be used with data obtained from live test administration.
A comparison of results from the computation of response pattern information
with theoretical test information indicated that the response pattern
information levels were consistently lower than the test information levels
for a given set of items. Presumably, this was because testees were not
responding exactly as predicted by the item characteristic curve model.
However, the shapes of the information curves provided by the two methods
of computing information were very similar. This suggests that response
pattern information is useful as a substitute for the theoretical test
information function; it is easily computed as part of the maximum
likelihood scoring procedure, and it reflects the characteristics of live
testing data (a characteristic which is useful in empirical research).

As expected, the adaptive testing of classroom achievement yielded
substantially more precise estimates of achievement than the conventional
classroom achievement tests. This improvement was evident in several
tests; it was reflected globally, as well as at all levels of achievement,
when test length was taken into account. However, the results indicated
that the degree of improvement of the adaptive test over the conventional
classroom test depended upon the psychometric characteristics of the
conventional test. For example, the comparison of the F1 classroom test
with the F1 adaptive test showed a large advantage in favor of the adaptive
test, since the items in the classroom test were well distributed
throughout the range of achievement. On the other hand, at some levels of θ
the F2 classroom test provided higher levels of information than the
stradaptive test. In terms of mean information per item, however, the
stradaptive test was still superior to the classroom test.

This finding serves to illustrate the possibility that within a
restricted range of θ, a conventional test can provide higher levels of
information than an adaptive test unless certain precautions are taken
in the administration of the adaptive test. One such precaution would
be not to administer too few items. That is, in some circumstances the
termination rule used in the stradaptive test should be modified to ensure
administration of a minimum number of items. A better solution, however,
would be to continue testing until a prespecified level of information is
reached for every individual (Samejima, 1977). A positive byproduct of this
solution would be to ensure a high and horizontal information function for the
adaptive test, i.e., equal precision of measurement at all levels of achievement.
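A termination rule of the kind suggested above, in which testing continues until a prespecified level of information is reached, might be sketched as follows. This is a minimal illustration and not the stradaptive procedure used in the study: it assumes a three-parameter logistic ICC with D = 1.7, selects at each step the unused item with maximum information at the current estimate, and re-estimates achievement by a grid-search maximum likelihood. The item pool, the response rule, and all names are hypothetical.

```python
import math

D = 1.7  # logistic scaling constant (an assumption of this sketch)
GRID = [g / 20.0 for g in range(-60, 61)]  # candidate theta values, -3 to 3

def icc(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Birnbaum's item information for the three-parameter logistic model."""
    p = icc(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def ml_estimate(items, responses):
    """Grid-search maximum likelihood estimate of theta."""
    def loglik(t):
        total = 0.0
        for item, u in zip(items, responses):
            p = icc(t, *item)
            total += math.log(p) if u == 1 else math.log(1.0 - p)
        return total
    return max(GRID, key=loglik)

def adaptive_test(pool, answer, target_info, max_items=50, start=0.0):
    """Administer items until accumulated information at the current
    estimate reaches target_info, the pool is empty, or max_items is hit."""
    theta, info = start, 0.0
    used, responses = [], []
    remaining = list(pool)
    while remaining and len(used) < max_items:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda it: item_information(theta, *it))
        remaining.remove(item)
        used.append(item)
        responses.append(answer(item))
        theta = ml_estimate(used, responses)
        info = sum(item_information(theta, *it) for it in used)
        if info >= target_info:
            break
    return theta, info, len(used)

# Hypothetical 17-item pool and a deterministic stand-in for a testee.
pool = [(1.2, k / 4.0, 0.2) for k in range(-8, 9)]
answer = lambda item: 1 if item[1] <= 0.5 else 0
print(adaptive_test(pool, answer, target_info=2.0))
```

Under such a rule test length varies across testees while the information attained is held roughly constant, which is the sense in which the resulting information function would be horizontal. When the pool lacks suitable items the target may not be reached, so the sketch also stops when the pool is exhausted.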

On the other hand, these data also reflect the dilemma encountered in
the construction of fixed-length conventional tests. Such tests can be
peaked so that the item difficulties are concentrated in a given region of θ;
the result will be a test providing high levels of information in a
restricted range of θ and low levels elsewhere. Alternatively, the fixed
number of items can be distributed in difficulty over the range of θ
(as in the F1 test used in this study); the result is a horizontal, but
low, information function. The test constructor cannot construct a
conventional achievement test with an information function which is both
high and flat, unless an inordinate number of test items is administered.
Adaptive testing, however, provides a ready solution to this problem,
which is confronted whenever there is considerable variability among students
in degrees of achievement resulting from instruction.

Because the classroom tests had not been constructed to be
psychometrically optimal, the information provided by the stradaptive tests
was compared to that provided by improved conventional tests which were
derived from the stradaptive tests' item pools. The improved conventional
tests consisted of items with discriminations at least as high as, and
typically higher than, those in the adaptive tests and were the same length
as the adaptive tests. No response pattern information function was
associated with the improved conventional test, since it had not actually
been administered. The test information function associated with the
improved conventional test was, therefore, compared to the response pattern
information function associated with the stradaptive test, at maximum
test lengths of 30 and 20 items. Results indicated that the adaptive test
yielded generally higher levels of information than the improved
conventional test.

These findings indicate that adaptive testing not only was superior
to typical classroom achievement tests, but also was superior to a conventional
test which was designed to make best use of the same item pool to measure
individual differences in achievement levels within a specified range.
The adaptive test both provided scores of higher precision and reduced the
number of items administered. The conclusion derived from comparison with
the improved conventional test is conservative, since response vector
information in the present data consistently underestimated test information.
In other words, had the improved conventional test been administered and
its response pattern information computed, the adaptive test with a maximum
length of 20 items would, in all probability, have been found to be
substantially more informative.

Contrary to previous research (Jensema, 1975), it was found that an
expanded item pool can improve the precision of measurement of scores derived
from it by adaptive testing. Jensema's findings were based on a situation
in which the items added to the pool were identical, with respect to all
three ICC parameters, to the items already in the pool. The results of
the present study indicate that even when the added items were only slightly
more discriminating, the addition of new items to the adaptive testing pool
had a fairly substantial effect, globally, as well as at most levels of
achievement, on the precision of measurement of scores derived from the pool.

This investigation has thus shown adaptive achievement testing to
be a feasible approach to the measurement of achievement; compared to
conventional tests, adaptive achievement testing yields considerably
more precise estimates of achievement, even when conventional tests are
designed to take maximum advantage of the items in the pool. In order
to exploit the advantages of adaptive achievement testing to its fullest,
however, it will be necessary to build a closer psychometric interface
between instruction and testing. Reduction in testing time by means of
adaptive testing is meaningless if the result is solely early dismissal
from examinations. Rather, what is needed is to link adaptive testing
with an adaptive instructional context, so that reductions in testing time
can be used in increased instructional activity.

Atkinson (1976) has described several examples of adaptive
computer-based instruction. These systems are adaptive not only because they
sequence instruction differently for each student, but also because they
differentially allot instructional time to students in order to maximize
specified objectives. Differentially allotting instructional time will,
in all probability, preserve individual differences in achievement.

This approach to testing and instruction contrasts with the current
emphasis on mastery learning and testing (Block, 1971). Mastery testing,
along with related approaches, is based on the conception that if instructional
time is long enough, every student will attain the same degree of achievement.
Although this may be true in principle, an increasing amount of research
suggests that individual differences persist even when instructional
time is allowed to vary (Cronbach & Snow, 1977). The implications for
instruction and measurement are obvious: An unequivocally useful system of
adaptive instruction and achievement testing must be able to consider
individual differences rather than attempt to create student homogeneity.
It seems that adaptive testing can meet that challenge.
.00

Mean (F)     1.08      .09     .29
Mean*(W)     1.17      .09     .28

Stratum 4
(19 items)

3744*        1.94     -.35     .30
3708*        1.62     -.20     .16
3631         1.53     -.18     .35
3814         1.26     -.32     .35
3903         1.21     -.43     .31
3671         1.51     -.14     .26
3701          .82     -.15     .35
3643         1.40     -.50     .25
3914          .98     -.39     .16
3693         1.13     -.24     .24
3725*        1.09     -.52     .24
3710*        1.02     -.33     .30
3653          .83     -.51     .33
3660          .78     -.39     .14
3922*         .64     -.26     .30
3606          .71     -.22     .14
3663          .69     -.17     .33
3696          .68     -.35     .00
3656          .63     -.31     .34

Mean (F)     1.01     -.31     .25
Mean*(W)     1.08     -.31     .26

(17  Itens) 


3634 

1.79 

-.58 

.30 

3739* 

1.68 

-.61 

.35 

3809 

1.27 

-.61 

.35 

3924* 

1.13 

-.79 

. 18 

3672 

1.57 

-.80 

.15 

3737* 

1.41 

-.66 

.34 

3915 

1.08 

-.61 

.16 

3640 

1.43 

-.69 

.35 

3906 

.87 

-.66 

.14 

3812 

.82 

-.63 

.13 

3682 

1.33 

-.72 

.34 

3637 

1.29 

-.73 

.28 

3636 

1.24 

-.63 

.27 

3641 

1.20 

-.65 

.22 

3711* 

1.05 

-.56 

.35 

3608 

1.04 

-.  78 

. 16 

3705* 

.87 

-.58 

.14 

Mean  (F) 

1.24 

-.67 

.24 

Mean*(W) 

1.25 

-.66 

.25 

Stratua  2 
(20  items) 


3735* 

1.63 

-.94 

.35 

3648 

1.59 

-.96 

.33 

3807 

1.52 

-1.10 

.17 

3907 

1.43 

-1.08 

.35 

3704* 

1.39 

-1.13 

.23 

3655 

1.37 

-.90 

.35 

3913 

1.20 

-.97 

.17 

3919* 

1.30 

-.98 

.21 

3680 

1.33 

-1.01 

. 16 

3806 

.99 

-1.00 

.30 

3686 

1.26 

-.88 

.29 

3721* 

1.23 

-1.20 

.22 

3821 

.90 

-.92 

.35 

3679 

1.21 

-.94 

.17 

3685 

1.19 

-1.01 

.16 

3668 

.97 

-.07 

.14 

3684 

.86 

-.85 

.14 

3703* 

.83 

-1,16 

.21 

3617 

.79 

-1.11 

. 14 

3713* 

.75 

-1.18 

.33 

Mean  (F) 

1.19 

-.97 

.23 

Mean*(W) 

1.19 

-1.01 

.24 

Stratua  I 
(19  items) 


3741* 

1.63 

-1.56 

.35 

3910 

1.58 

-1.59 

.21 

3692 

1.53 

-1.28 

.35 

3825 

1.09 

-1.38 

.34 

3639 

1.47 

-1.80 

.35 

3638 

1.35 

-1.54 

.21 

3913 

1.31 

-1.31 

.19 

3837* 

1.09 

-1.59 

.25 

3715* 

1.16 

-1.63 

.26 

3920* 

1.12 

-1.34 

.23 

3842* 

1.01 

-1.55 

.35 

3695 

1.09 

-1.73 

.22 

3731* 

1.05 

-1.67 

.35 

3832 

.99 

-1.74 

.32 

3838* 

.99 

-1.68 

.35 

3613 

.86 

-1.74 

.33 

3683 

.85 

-1.31 

.14 

3657 

.81 

-1.74 

.15 

3610 

.80 

-1.33 

.14 

Mean  (F) 

1.14 

-1.54 

.26 

Mean*(W) 

1.15 

-1.55 

.28 

Mote . Iteas  with  asterisks  are  those  which  were  added  to  the  pool  Winter  quarter.  All  other 
Iteaa  were  In  the  pool  both  Fall  and  Winter  quarters. 
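
The two summary rows under each stratum can be read as follows: Mean (F) averages only the items without an asterisk (the Fall pool), while Mean*(W) averages every item in the stratum (the Winter pool). The following is a minimal Python sketch of that calculation for Stratum 4. The tuple layout and function name are illustrative choices, and the difficulty of item 3631 is taken as -.18, an assumption that reproduces the printed stratum means.

```python
# Stratum 4 rows as (item number, added in Winter quarter?, a, b, c).
# Assumption: b for item 3631 is read as -.18, which reproduces the printed means.
STRATUM_4 = [
    (3744, True, 1.94, -0.35, 0.30),
    (3708, True, 1.62, -0.20, 0.16),
    (3631, False, 1.53, -0.18, 0.35),
    (3814, False, 1.26, -0.32, 0.35),
    (3903, False, 1.21, -0.43, 0.31),
    (3671, False, 1.51, -0.14, 0.26),
    (3701, False, 0.82, -0.15, 0.35),
    (3643, False, 1.40, -0.50, 0.25),
    (3914, False, 0.98, -0.39, 0.16),
    (3693, False, 1.13, -0.24, 0.24),
    (3725, True, 1.09, -0.52, 0.24),
    (3710, True, 1.02, -0.33, 0.30),
    (3653, False, 0.83, -0.51, 0.33),
    (3660, False, 0.78, -0.39, 0.14),
    (3922, True, 0.64, -0.26, 0.30),
    (3606, False, 0.71, -0.22, 0.14),
    (3663, False, 0.69, -0.17, 0.33),
    (3696, False, 0.68, -0.35, 0.00),
    (3656, False, 0.63, -0.31, 0.34),
]


def column_means(rows):
    """Return the mean a, b, and c over the rows, each rounded to two decimals."""
    n = len(rows)
    return tuple(round(sum(row[k] for row in rows) / n, 2) for k in (2, 3, 4))


fall_pool = [row for row in STRATUM_4 if not row[1]]  # items without an asterisk
winter_pool = STRATUM_4                               # all items in the stratum

mean_f = column_means(fall_pool)
mean_w = column_means(winter_pool)
print("Mean (F): ", mean_f)
print("Mean*(W):", mean_w)
```

With these values the sketch returns 1.01, -.31, .25 for the Fall pool and 1.08, -.31, .26 for the Winter pool, which agree with the two summary rows printed for Stratum 4.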

Table C

Item Discrimination (a), Difficulty (b), and Guessing
(c) Parameters for Classroom Tests F1 and F2

              F1                                 F2
Item No.      a       b      c     Item No.      a       b      c

3060        .86   -1.31    .29     3922        .64    -.26    .30
3067       1.07    -.76    .21     3904       2.45    1.58    .28
3065       1.17   -1.66    .35     3918        .66     .35    .23
3056        .71     .89    .26     3921        .91    1.23    .29
3063        .91    1.51    .35     3919       1.30    -.98    .21
3073       1.43   -1.57    .31     3920       1.12   -1.34    .23
3058       1.05    -.43    .35     3923        .63     .38    .31
3274        .85   -1.05    .26     3924       1.13    -.79    .18
3271        .95    1.32    .30     3801        .80    -.17    .35
3055       1.71    -.65    .24     3841        .87    2.13    .35
3072       1.02     .65    .32     3838        .99   -1.68    .35
3057       1.20   -1.35    .26     3833       2.50    2.85    .35
3064        .94     .86    .24     3837       1.09   -1.59    .25
3069        .88    -.01    .35     3835       1.21    2.28    .35
3054       1.29    -.93    .31     3641       1.20    -.65    .22
3066       1.05     .53    .31     3708       1.62    -.20    .16
3268        .97    -.28    .18     3718       1.22     .16    .33
3267       1.02   -1.22    .23     3728        .91    2.55    .35
3272       1.06    -.81    .35     3665       1.19     .54    .22
3070        .95   -1.28    .22     3730        .75     .01    .10
3008        .96   -1.75    .18     3719       1.18    1.08    .31
3019       1.31     .29    .29     3705        .87    -.58    .14
3062       1.47     .43    .30     3713        .75   -1.18    .33
3061        .95    1.57    .30     3703        .83   -1.16    .21
3262        .81     .47    .35     3709       1.19     .30    .35
3263        .99    2.29    .35     3707       1.75     .55    .31
3447       1.18     .93    .32     3721       1.23   -1.20    .22
3443       1.07   -1.64    .35     3717        .83    1.25    .35
3438        .70     .21    .27     3715       1.16   -1.63    .26
3448       1.40     .73    .30     3716       1.14    1.14    .27
3435        .83    -.61    .35     3720       1.45     .26    .29
3439       1.36     .64    .32     3744       1.94    -.35    .30
3436       1.12    1.59    .35     3745       1.58    -.07    .20
3449        .91    1.26    .14     3746       1.59     .43    .30
3440       1.52    2.00    .30     3711       1.05    -.56    .35
3437       1.95     .66    .28     3710       1.02    -.33    .30
3427        .92    1.51    .26     3724       1.14     .37    .30
3445       1.19     .44    .34     3725       1.09    -.52    .24
3444        .88     .78    .35     3731       1.05   -1.67    .35
3712        .75    1.64    .30     3704       1.39   -1.13    .23

Mean       1.09     .11    .29     Mean       1.17     .07    .28

Table D

Item Discrimination (a), Difficulty (b), and Guessing
(c) Parameters for Classroom Tests W1 and W2

              W1                                 W2
Item No.      a       b      c     Item No.      a       b      c

3287        .85   -1.28    .13     3750        .93   -1.79    .34
3292        .68    1.39    .35     3926        .93   -1.56    .16
3219       1.23     .62    .21     3845       1.71     .26    .29
3290       1.16    -.57    .20     3763       1.23    1.95    .28
3214       1.12     .03    .23     3762       1.97   -1.56    .17
3268        .97    -.28    .18     3772        .74    -.84    .35
3289       1.14   -1.45    .35     3759        .99    -.14    .21
3293        .96   -1.30    .14     3768       1.11   -1.55    .17
3291        .65     .52    .35     3756       1.10    -.21    .28
3249        .91   -1.69    .17     3749       1.05   -1.77    .22
3083       1.05    -.90    .13     3757       1.18   -1.60    .18
3090       1.48   -1.65    .18     3755       1.03    -.12    .16
3054       1.29    -.93    .31     3747       1.11   -1.69    .18
3084       1.22   -1.06    .15     3753        .91    -.55    .17
3092        .98    -.65    .15     3654       1.51     .84    .21
3082       1.05    2.27    .35     3673       1.51    1.11    .31
3011       1.32    -.86    .20     3716       1.14    1.14    .27
3095        .79   -1.20    .12     3700        .84     .85    .30
3085       1.16   -1.81    .35     3773       1.69    1.62    .27
3423        .66     .16    .27     3748        .85    1.31    .35
3453       1.19     .48    .22     3766       1.12    1.41    .35
3456       1.03    2.71    .35     3760       1.28   -1.58    .18
3454       1.10    2.66    .35     3758        .89   -1.45    .15
3460       1.99    1.59    .34     3703        .83   -1.16    .21
3452        .75    1.98    .31     3853       1.05     .12    .17
3406       1.31    2.48    .35     3854       1.03    -.19    .31
3461        .94    1.51    .35     3852        .69   -1.78    .35
3457        .90    1.87    .28     3850        .89    1.83    .35
3459        .84    -.29    .26     3851        .76     .18    .23
3407       1.02    2.41    .29     3752       1.24    -.50    .19
3458       1.46   -1.10    .15     3769       1.15    -.39    .16
3432       1.72     .67    .35     3751        .80    1.91    .35
3455        .96    -.61    .31     3770       2.50    1.73    .00
3420        .68    1.62    .35     3622        .95    2.53    .35
3433       1.35     .86    .30     3761        .84    1.27    .32
3412       1.12     .19    .35     3767       1.02    -.04    .30
3462       1.31   -1.03    .17     3930       1.21    -.44    .35
3285        .79    -.60    .11     3904       2.45    1.58    .28
3294        .76    -.68    .19     3918        .66     .35    .23
3041       1.51     .23    .35     3903       1.21    -.43    .31
3091       1.64     .58    .30     3928       1.00     .65    .35
3089        .92    -.37    .30     3929        .96   -1.76    .22
3093        .75    -.94    .11     3813       1.20    -.97    .17
3096       1.48   -1.48    .16     3927       1.01   -1.34    .16
3086        .74    -.67    .35

Mean       1.09     .08    .25     Mean       1.14    -.06    .25

Table F

Theoretical Test Information Values for First and
Second Classroom Tests for Fall and Winter Quarters

                    Test 1                Test 2
θ Midpoint      Fall    Winter        Fall    Winter

     -1.90      1.11      1.89        1.13      2.26
     -1.70      1.46      2.55        1.54      2.86
     -1.50      1.80      3.18        1.97      3.26
     -1.30      2.09      3.70        2.39      3.41
     -1.10      2.33      4.04        2.77      3.41
      -.90      2.52      4.16        3.11      3.37
      -.70      2.64      4.09        3.47      3.38
      -.50      2.66      3.89        3.86      3.40
      -.30      2.63      3.67        4.18      3.42
      -.10      2.66      3.49        4.33      3.42
       .10      2.83      3.42        4.31      3.39
       .30      3.13      3.41        4.18      3.33
       .50      3.46      3.40        3.94      3.23
       .70      3.63      3.28        3.57      3.15
       .90      3.55      3.00        3.13      3.15
      1.10      3.25      2.66        2.74      3.35
      1.30      2.89      2.38        2.56      3.87
      1.50      2.56      2.20        2.54      4.57
      1.70      2.28      2.04        2.37      4.79
      1.90      1.93      1.84        1.96      4.13
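
Information values such as these are often easier to interpret after conversion. The sketch below, using the Test 1 columns as printed, computes two standard quantities: the asymptotic standard error of the achievement estimate, 1 / sqrt(information), and the relative efficiency of one test with respect to another, taken as the ratio of their information values at the same level of θ. The function and variable names are illustrative assumptions, not part of the original analysis.

```python
import math

# Theta interval midpoints (-1.9 to 1.9 in steps of .2) and the Test 1
# information values for the Fall and Winter quarters.
THETA = [round(-1.9 + 0.2 * k, 1) for k in range(20)]
TEST1_FALL = [1.11, 1.46, 1.80, 2.09, 2.33, 2.52, 2.64, 2.66, 2.63, 2.66,
              2.83, 3.13, 3.46, 3.63, 3.55, 3.25, 2.89, 2.56, 2.28, 1.93]
TEST1_WINTER = [1.89, 2.55, 3.18, 3.70, 4.04, 4.16, 4.09, 3.89, 3.67, 3.49,
                3.42, 3.41, 3.40, 3.28, 3.00, 2.66, 2.38, 2.20, 2.04, 1.84]


def standard_error(information):
    """Asymptotic standard error of the theta estimate: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(information)


def relative_efficiency(info_x, info_y):
    """Information of test x divided by information of test y at the same theta."""
    return info_x / info_y


for theta, fall, winter in zip(THETA, TEST1_FALL, TEST1_WINTER):
    print(f"theta={theta:+.1f}  SE(Fall)={standard_error(fall):.2f}  "
          f"SE(Winter)={standard_error(winter):.2f}  "
          f"RE(Winter/Fall)={relative_efficiency(winter, fall):.2f}")
```

A relative efficiency above 1.0 at a given θ indicates that the Winter test provides more information than the Fall test at that level; values below 1.0 indicate the reverse.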

Table I

Mean Information Divided by Mean Number of Items
at Levels of θ for the Adaptive Tests

                Adaptive Test 1       Adaptive Test 2
θ Midpoint      Fall    Winter        Fall    Winter

      -1.9       .07       .11         .07       .12
      -1.7       .08       .08         .10       .06
      -1.5       .10       .13         .10       .11
      -1.3       .12       .13         .11       .13
      -1.1       .15       .13         .12       .14
       -.9       .13       .16         .12       .12
       -.7       .14       .17         .15       .14
       -.5       .17       .16         .11       .13
       -.3       .22       .21         .11       .18
       -.1       .20       .19         .12       .16
        .1       .18       .17         .12       .17
        .3       .16       .18         .12       .16
        .5       .16       .17         .11       .16
        .7       .16       .18         .13       .16
        .9       .18       .17         .13       .15
       1.1       .20       .18         .13       .13
       1.3       .23       .19         .12       .11
       1.5       .24       .16         .12       .12
       1.7       .15       .18         .13       .11
       1.9       .44       .32         .13       .11

Mean             .17       .17  [illegible]      .15
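
Information per item gives a common scale for tests of different lengths. As a rough illustration, the sketch below places the conventional Fall Test 1 on that scale by dividing its theoretical information (Table F) by its length, then sets the result beside the adaptive Fall Test 1 values above. The test length of 40 is an assumption: it is the number of F1 items listed in Table C, and the correspondence between F1 and the Fall "Test 1" column of Table F is likewise assumed. Only three θ levels are shown.

```python
N_ITEMS_F1 = 40  # assumption: number of F1 items listed in Table C

# theta midpoint -> (conventional Fall Test 1 information, Table F;
#                    adaptive Fall Test 1 information per item, Table I)
LEVELS = {
    -1.9: (1.11, 0.07),
    -0.1: (2.66, 0.20),
    1.9: (1.93, 0.44),
}


def information_per_item(total_information, n_items):
    """Average information contributed by each administered item."""
    return total_information / n_items


for theta, (conventional_total, adaptive_per_item) in LEVELS.items():
    conventional_per_item = information_per_item(conventional_total, N_ITEMS_F1)
    ratio = adaptive_per_item / conventional_per_item
    print(f"theta={theta:+.1f}  conventional={conventional_per_item:.3f}  "
          f"adaptive={adaptive_per_item:.2f}  ratio={ratio:.1f}")
```

Under these assumptions the ratio shows how many times more information each adaptive item yields than each conventional item at the same achievement level.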